On the Approximability of Geometric and Geographic Generalization and the Min-Max Bin Covering Problem

Authors

  • Wenliang Du
  • David Eppstein
  • Michael T. Goodrich
  • George S. Lueker
Abstract

We study the problem of abstracting a table of data about individuals so that no selection query can identify fewer than k individuals. As is common in existing work on this k-anonymization problem, we perform the anonymization by generalizing the values of quasi-identifying attributes into equivalence classes. Since such data tables are intended for use in data mining, we consider the natural optimization criterion of minimizing the maximum size of any equivalence class, subject to the constraint that each class has size at least k. We show that, unless P = NP, it is impossible to achieve arbitrarily good polynomial-time approximations for a number of natural variations of the generalization technique, even when the table has only a single quasi-identifying attribute that represents a geographic or unordered attribute:

  • Zip-codes: nodes of a planar graph generalized into connected subgraphs
  • GPS coordinates: points in R^2 generalized into non-overlapping rectangles
  • Unordered data: text labels that can be grouped arbitrarily

These hard single-attribute instances of generalization problems contrast with the previously known NP-hard instances, which require the number of attributes to be proportional to the number of individual records (the rows of the table). In addition to impossibility results, we provide approximation algorithms for these difficult single-attribute generalization problems, which, of course, apply to multiple-attribute instances with one that is quasi-identifying. We show theoretically and experimentally that our approximation algorithms can come reasonably close to optimal solutions. Incidentally, the generalization problem for unordered data can be viewed as a novel type of bin packing problem, min-max bin covering, which may be of independent interest.
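To make the min-max bin covering objective from the abstract concrete, here is a minimal Python sketch. It is not the authors' approximation algorithm, only a naive greedy heuristic written for illustration. The function names `greedy_min_max_cover` and `max_bin_size`, and the modeling of each text label as an integer count of records, are assumptions introduced here.

```python
def greedy_min_max_cover(sizes, k):
    """Group item sizes into bins whose totals are all at least k.

    Illustrative heuristic only (hypothetical, not from the paper):
    fill bins in decreasing size order, close a bin as soon as it
    reaches k, then spread any leftover items over the lightest bins.
    """
    if sum(sizes) < k:
        raise ValueError("infeasible: total size is below k")
    bins, current, total = [], [], 0
    for s in sorted(sizes, reverse=True):
        current.append(s)
        total += s
        if total >= k:  # bin is covered, so close it
            bins.append(current)
            current, total = [], 0
    # Items left in an uncovered bin go to whichever bin is lightest.
    for s in sorted(current, reverse=True):
        min(bins, key=sum).append(s)
    return bins


def max_bin_size(bins):
    """Objective value: the total size of the largest bin."""
    return max(sum(b) for b in bins)


if __name__ == "__main__":
    groups = greedy_min_max_cover([4, 3, 3, 2, 2, 1], k=5)
    print(groups, max_bin_size(groups))
```

Every bin the heuristic returns meets the size-at-least-k constraint, but the maximum bin size it reaches can be well above the optimum, which is the gap that approximation results for this problem are concerned with.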

Similar resources

Heuristic and exact algorithms for Generalized Bin Covering Problem

In this paper, we study the Generalized Bin Covering problem. For this problem, we introduce an exact algorithm that can find an optimal solution for small-scale instances. To find a near-optimal solution for large-scale instances, a heuristic algorithm is proposed. The efficiency of the heuristic algorithm is assessed by computational experiments.


A goal geometric programming problem (G2P2) with logarithmic deviational variables and its applications on two industrial problems

A very useful multi-objective technique is goal programming. There are many methodologies of goal programming such as weighted goal programming, min-max goal programming, and lexicographic goal programming. In this paper, weighted goal programming is reformulated as goal programming with logarithmic deviation variables. Here, a comparison of the proposed method and goal programming with weighte...


On Approximability of Linear Ordering and Related NP-optimization Problems on Graphs (Extended Abstract)

We investigate the approximability of minimum and maximum linear ordering problems (MIN-LOP and MAX-LOP) and related feedback set problems such as maximum weight acyclic subdigraph (MAX-W-SUBDAG), minimum weight feedback arc/vertex set (MIN-W-FAS/MIN-W-FVS) and a generalization of the latter called MIN-Subset-FAS/MIN-Subset-FVS. MAX-LOP and the other problems have been studied by many researc...


Approximability of Virtual Machine Allocation: Much Harder than Bin Packing

The allocation of virtual machines (VMs) to physical machines in data centers is a key optimization problem for cloud service providers. It is well known that the VM allocation problem contains the classic bin packing problem as a special case. This paper investigates to what extent the existing approximability results on bin packing and its generalizations can be applied to the VM allocation pro...


A Structural Lemma in 2-Dimensional Packing, and Its Implications on Approximability

We present a new lemma stating that, given an arbitrary packing of a set of rectangles into a larger rectangle, a “structured” packing of nearly the same set of rectangles exists. This lemma has several implications on the approximability of 2-dimensional packing problems. In this paper, we use it to show the existence of a polynomial-time approximation scheme for 2-dimensional geometric knapsa...

Publication date: 2009